---
title: IceVision Bboxes - CycleGAN Data
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/iv_bbox_fake2.ipynb"
---
{% raw %}
{% endraw %}

This is a mashup of IceVision's "Custom Parser" example and their "Getting Started (Object Detection)" notebooks, used to analyze the SPNet Real dataset, for which I generated bounding boxes. -- S.H. Hawley, July 1, 2021

Installing IceVision and IceData

If on Colab, run the following cell; otherwise check the installation instructions.

{% raw %}
#try:
#    !wget https://raw.githubusercontent.com/airctic/icevision/master/install_colab.sh
#    !chmod +x install_colab.sh && ./install_colab.sh
#except:
#    print("Ignore the error messages and just keep going")
{% endraw %} {% raw %}
 
{% endraw %} {% raw %}
import torch, re
tv, cv = torch.__version__, torch.version.cuda
tv = re.sub(r'\+cu.*', '', tv)          # strip any local "+cuXXX" build suffix
TORCH_VERSION = 'torch'+tv[0:-1]+'0'    # e.g. '1.8.1' -> 'torch1.8.0'
CUDA_VERSION = 'cu'+cv.replace('.','')  # e.g. '10.2' -> 'cu102'

print(f"TORCH_VERSION={TORCH_VERSION}; CUDA_VERSION={CUDA_VERSION}")
print(f"CUDA available = {torch.cuda.is_available()}, Device count = {torch.cuda.device_count()}, Current device = {torch.cuda.current_device()}")
print(f"Device name = {torch.cuda.get_device_name()}")
TORCH_VERSION=torch1.8.0; CUDA_VERSION=cu102
CUDA available = True, Device count = 1, Current device = 0
Device name = TITAN X (Pascal)
{% endraw %} {% raw %}
#!pip install -qq mmcv-full=="1.3.8" -f https://download.openmmlab.com/mmcv/dist/{CUDA_VERSION}/{TORCH_VERSION}/index.html --upgrade
#!pip install mmdet -qq
{% endraw %}

Imports

As always, let's import everything from icevision. We will also need pandas (you might need to install it with pip install pandas).

{% raw %}
from icevision.all import *
import pandas as pd
INFO     - The mmdet config folder already exists. No need to downloaded it. Path : /home/shawley/.icevision/mmdetection_configs/mmdetection_configs-2.10.0/configs | icevision.models.mmdet.download_configs:download_mmdet_configs:17
{% endraw %}

Download dataset

We're going to be using the espiownage steelpan dataset. The cells below show options for downloading a public copy, or for pointing at a local copy already on disk.

{% raw %}
#!rm -rf  /root/.icevision/data/espiownage-cyclegan
{% endraw %} {% raw %}
#data_url = "https://hedges.belmont.edu/~shawley/spnet_sample-master.zip"
#data_dir = icedata.load_data(data_url, 'spnet_sample') / 'spnet_sample-master' 

# can use public espiownage cyclegan dataset:
#data_url = 'https://hedges.belmont.edu/~shawley/espiownage-cyclegan.tgz'
#data_dir = icedata.load_data(data_url, 'espiownage-cyclegan') / 'espiownage-cyclegan'

# or local data already there:
from pathlib import Path
data_dir = Path('/home/shawley/datasets/espiownage-fake')
{% endraw %}

Understand the data format

In this task we were given a .csv file with annotations; let's take a look at it.

!!! danger "Important"
Replace data_dir with your own path for the dataset directory.

{% raw %}
df = pd.read_csv(data_dir / "bboxes/annotations.csv")
df.head()
filename width height label xmin ymin xmax ymax
0 steelpan_0000000.png 512 384 8 135 110 322 287
1 steelpan_0000000.png 512 384 4 399 4 462 103
2 steelpan_0000000.png 512 384 2 20 132 79 211
3 steelpan_0000000.png 512 384 4 353 175 504 254
4 steelpan_0000000.png 512 384 6 75 34 162 105
{% endraw %}

At first glance, we can make the following assumptions:

  • Multiple rows with the same filename, width, height
  • A label for each row
  • A bbox [xmin, ymin, xmax, ymax] for each row

Once we know what our data provides we can create our custom Parser.
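The assumptions above are easy to check directly with pandas. Here's a small self-contained sketch (on made-up rows in the same format as annotations.csv, not the real data) confirming that several bbox rows can share one filename and so should be grouped into one record per image:

```python
import pandas as pd

# Made-up rows in the annotations.csv format: two boxes in the first image,
# one in the second.
df = pd.DataFrame(
    {"filename": ["steelpan_0000000.png", "steelpan_0000000.png", "steelpan_0000001.png"],
     "width": [512, 512, 512], "height": [384, 384, 384],
     "label": [8, 4, 2],
     "xmin": [135, 399, 20], "ymin": [110, 4, 132],
     "xmax": [322, 462, 79], "ymax": [287, 103, 211]})

# One record per image -> group annotation rows by filename
boxes_per_image = df.groupby("filename").size().to_dict()
print(boxes_per_image)  # {'steelpan_0000000.png': 2, 'steelpan_0000001.png': 1}
```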

{% raw %}
set(np.array(df['label']).flatten())
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
{% endraw %} {% raw %}
df['label'] /= 2                      # halve the raw labels: {0..11} -> {0..5}
#df.head()
df['label'] = df['label'].apply(int) 
print(set(np.array(df['label']).flatten()))
df['label'] = "_"+df['label'].apply(str)+"_"  # wrap as strings so they read as class names
{0, 1, 2, 3, 4, 5}
{% endraw %} {% raw %}
df.head()
filename width height label xmin ymin xmax ymax
0 steelpan_0000000.png 512 384 _4_ 135 110 322 287
1 steelpan_0000000.png 512 384 _2_ 399 4 462 103
2 steelpan_0000000.png 512 384 _1_ 20 132 79 211
3 steelpan_0000000.png 512 384 _2_ 353 175 504 254
4 steelpan_0000000.png 512 384 _3_ 75 34 162 105
{% endraw %} {% raw %}
df['label'] = 'AN'  # antinode
df.head()
filename width height label xmin ymin xmax ymax
0 steelpan_0000000.png 512 384 AN 135 110 322 287
1 steelpan_0000000.png 512 384 AN 399 4 462 103
2 steelpan_0000000.png 512 384 AN 20 132 79 211
3 steelpan_0000000.png 512 384 AN 353 175 504 254
4 steelpan_0000000.png 512 384 AN 75 34 162 105
{% endraw %}

Create the Parser

The first step is to create a template record for our specific type of dataset, in this case we're doing standard object detection:

{% raw %}
template_record = ObjectDetectionRecord()
{% endraw %}

Now we use the generate_template method, which prints out all the methods we need to implement.

{% raw %}
Parser.generate_template(template_record)
class MyParser(Parser):
    def __init__(self, template_record):
        super().__init__(template_record=template_record)
    def __iter__(self) -> Any:
    def __len__(self) -> int:
    def record_id(self, o: Any) -> Hashable:
    def parse_fields(self, o: Any, record: BaseRecord, is_new: bool):
        record.set_filepath(<Union[str, Path]>)
        record.set_img_size(<ImgSize>)
        record.detection.set_class_map(<ClassMap>)
        record.detection.add_labels(<Sequence[Hashable]>)
        record.detection.add_bboxes(<Sequence[BBox]>)
{% endraw %}

We can copy the template and use it as our starting point. Let's go over each of the methods we have to define:

  • __init__: What happens here is completely up to you; normally we have to pass some reference to our data, data_dir in our case.

  • __iter__: This tells our parser how to iterate over our data; each item returned here will be passed to parse_fields as o. In our case we call df.itertuples to iterate over all df rows.

  • __len__: How many items we will be iterating over.

  • record_id: Should return a Hashable (int, str, etc.). In our case we want all the dataset items that have the same filename to be unified in the same record.

  • parse_fields: Here is where the attributes of the record are collected; the template suggests which methods we need to call on the record and what parameters each expects. The parameter o it receives is the item returned by __iter__.

!!! danger "Important"
Be sure to pass the correct type on all record methods!

{% raw %}
class BBoxParser(Parser):
    def __init__(self, template_record, data_dir):
        super().__init__(template_record=template_record)
        
        self.data_dir = data_dir
        self.df = pd.read_csv(data_dir / "bboxes/annotations.csv")
        #self.df['label'] /= 2
        #self.df['label'] = self.df['label'].apply(int) 
        #self.df['label'] = "_"+self.df['label'].apply(str)+"_"
        self.df['label'] = 'AN'  # make them all the same object
        self.class_map = ClassMap(list(self.df['label'].unique()))
        
    def __iter__(self) -> Any:
        for o in self.df.itertuples():
            yield o
        
    def __len__(self) -> int:
        return len(self.df)
        
    def record_id(self, o) -> Hashable:
        return o.filename
        
    def parse_fields(self, o, record, is_new):
        if is_new:
            record.set_filepath(self.data_dir / 'images' / o.filename)
            record.set_img_size(ImgSize(width=o.width, height=o.height))
            record.detection.set_class_map(self.class_map)
        
        record.detection.add_bboxes([BBox.from_xyxy(o.xmin, o.ymin, o.xmax, o.ymax)])
        record.detection.add_labels([o.label])
{% endraw %}

Let's randomly split the data and parse it with Parser.parse:
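By default, Parser.parse() splits the records randomly, roughly 80/20 between training and validation. As a rough pure-Python illustration of what such a default random splitter does (the library's own splitter may differ in details like rounding and seeding):

```python
import random

def random_split(record_ids, train_frac=0.8, seed=42):
    # Shuffle the ids, then cut the list at the train fraction --
    # a sketch of a default 80/20 random split.
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_frac)
    return ids[:cut], ids[cut:]

train_ids, valid_ids = random_split(range(10))
print(len(train_ids), len(valid_ids))  # 8 2
```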

{% raw %}
parser = BBoxParser(template_record, data_dir)
{% endraw %} {% raw %}
train_records, valid_records = parser.parse()
INFO     - Autofixing records | icevision.parsers.parser:parse:136
{% endraw %}

Let's take a look at one record:

{% raw %}
show_record(train_records[5], display_label=False, figsize=(14, 10))
{% endraw %} {% raw %}
train_records[0]
BaseRecord

common: 
	- Filepath: /home/shawley/datasets/espiownage-fake/images/steelpan_0001026.png
	- Img: None
	- Record ID: 1026
	- Image size ImgSize(width=512, height=384)
detection: 
	- Class Map: <ClassMap: {'background': 0, 'AN': 1}>
	- Labels: [1, 1, 1, 1, 1, 1]
	- BBoxes: [<BBox (xmin:176, ymin:16, xmax:415, ymax:265)>, <BBox (xmin:314, ymin:288, xmax:489, ymax:381)>, <BBox (xmin:15, ymin:60, xmax:166, ymax:217)>, <BBox (xmin:46, ymin:240, xmax:155, ymax:384)>, <BBox (xmin:424, ymin:95, xmax:477, ymax:150)>, <BBox (xmin:419, ymin:163, xmax:506, ymax:252)>]
{% endraw %}

Moving On...

Following the Getting Started "refrigerator" notebook...

{% raw %}
# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
image_size = 384  
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])

# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)
{% endraw %}

This next cell generates an error; ignore it and move on.

{% raw %}
# same record three times -> three different random augmentations
samples = [train_ds[0] for _ in range(3)]
show_samples(samples, ncols=3)
{% endraw %} {% raw %}
model_type = models.mmdet.retinanet
backbone = model_type.backbones.resnet50_fpn_1x(pretrained=True)
{% endraw %} {% raw %}
selection = 1


extra_args = {}

if selection == 0:
  model_type = models.mmdet.retinanet
  backbone = model_type.backbones.resnet50_fpn_1x

elif selection == 1:
  # The Retinanet model is also implemented in the torchvision library
  model_type = models.torchvision.retinanet
  backbone = model_type.backbones.resnet50_fpn

elif selection == 2:
  model_type = models.ross.efficientdet
  backbone = model_type.backbones.tf_lite0
  # The efficientdet model requires an img_size parameter
  extra_args['img_size'] = image_size

elif selection == 3:
  model_type = models.ultralytics.yolov5
  backbone = model_type.backbones.small
  # The yolov5 model requires an img_size parameter
  extra_args['img_size'] = image_size

model_type, backbone, extra_args
(<module 'icevision.models.torchvision.retinanet' from '/home/shawley/envs/icevision/lib/python3.8/site-packages/icevision/models/torchvision/retinanet/__init__.py'>,
 <icevision.models.torchvision.retinanet.backbones.resnet_fpn.RetinanetTorchvisionBackboneConfig at 0x7fe3183fa940>,
 {})
{% endraw %} {% raw %}
model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args) 
{% endraw %} {% raw %}
train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=4, shuffle=False)
{% endraw %} {% raw %}
model_type.show_batch(first(valid_dl), ncols=4)
{% endraw %} {% raw %}
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
{% endraw %} {% raw %}
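{% endraw %}

COCOMetric scores each predicted box against the ground truth by intersection-over-union (IoU) overlap, averaging precision over IoU thresholds from 0.5 to 0.95. As a minimal pure-Python reminder of what IoU computes (a hypothetical helper for illustration, not part of IceVision):

```python
def iou(a, b):
    """Intersection-over-union of two [xmin, ymin, xmax, ymax] boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

print(round(iou([0, 0, 100, 100], [50, 0, 150, 100]), 3))  # 0.333
```

{% raw %}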
learn = model_type.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics)
{% endraw %} {% raw %}
learn.lr_find(end_lr=1e-3)

# For Sparse-RCNN, use lower `end_lr`
# learn.lr_find(end_lr=0.005)
SuggestedLRs(lr_min=1.4454397023655474e-05, lr_steep=0.00010964782268274575)
{% endraw %} {% raw %}
learn.fine_tune(60, 3e-5, freeze_epochs=2)
epoch train_loss valid_loss COCOMetric time
0 1.306630 1.087196 0.021880 00:44
1 0.915468 0.902527 0.088781 00:44
epoch train_loss valid_loss COCOMetric time
0 0.791113 0.810303 0.180507 01:03
1 0.737577 0.742241 0.267558 01:03
2 0.682668 0.691719 0.322904 01:03
3 0.622674 0.633965 0.384318 01:03
4 0.567769 0.582195 0.439485 01:03
5 0.527422 0.527512 0.493137 01:03
6 0.484775 0.479645 0.525163 01:03
7 0.458943 0.456298 0.531061 01:03
8 0.427073 0.412692 0.578236 01:02
9 0.396412 0.389732 0.586960 01:02
10 0.372663 0.359268 0.628093 01:02
11 0.354193 0.335637 0.655950 01:02
12 0.337307 0.325644 0.658819 01:02
13 0.326799 0.301036 0.666829 01:01
14 0.308090 0.289367 0.673958 01:02
15 0.307322 0.272632 0.696338 01:01
16 0.287230 0.256061 0.717152 01:01
17 0.284758 0.273184 0.661895 01:01
18 0.285536 0.260669 0.670540 01:01
19 0.273382 0.234869 0.732979 01:01
20 0.258657 0.262652 0.668649 01:01
21 0.257409 0.239199 0.707886 01:01
22 0.252329 0.218167 0.746068 01:01
23 0.248076 0.212785 0.750190 01:01
24 0.249072 0.216585 0.722026 01:00
25 0.235517 0.213217 0.729160 01:00
26 0.231704 0.220141 0.714063 01:01
27 0.235016 0.203585 0.749039 01:01
28 0.222970 0.225444 0.699053 01:01
29 0.228616 0.210498 0.719219 01:01
30 0.213365 0.195068 0.754046 01:01
31 0.223220 0.204833 0.729005 01:01
32 0.214508 0.184748 0.772472 01:01
33 0.213924 0.208979 0.716618 01:01
34 0.216720 0.200680 0.726173 01:00
35 0.205956 0.191587 0.755914 01:01
36 0.208912 0.212686 0.702795 01:00
37 0.202199 0.203589 0.704788 01:00
38 0.202491 0.180610 0.760660 01:01
39 0.210989 0.176752 0.773994 01:01
40 0.194234 0.197860 0.728187 01:00
41 0.198444 0.189317 0.731311 01:01
42 0.193839 0.169028 0.784215 01:00
43 0.196560 0.194268 0.720679 01:01
44 0.200109 0.183451 0.755299 01:01
45 0.198458 0.182674 0.743378 01:01
46 0.197688 0.175749 0.762584 01:00
47 0.186225 0.182154 0.746048 01:00
48 0.194730 0.181788 0.741766 01:00
49 0.191977 0.191406 0.718483 01:00
50 0.189743 0.185901 0.736302 01:00
51 0.192683 0.188123 0.728920 01:01
52 0.184791 0.183921 0.740639 01:00
53 0.182162 0.186814 0.732567 01:00
54 0.187391 0.185373 0.737552 01:00
55 0.186228 0.192131 0.719928 01:00
56 0.187785 0.185250 0.734480 01:00
57 0.189979 0.187674 0.730874 01:01
58 0.188858 0.187809 0.730366 01:00
59 0.186192 0.188080 0.729695 01:01
{% endraw %} {% raw %}
model_type.show_results(model, valid_ds, detection_threshold=.5)
{% endraw %}

Inference

{% raw %}
preds = model_type.predict(model, valid_ds, keep_images=True)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-34-9dcbb4ec7382> in <module>
----> 1 preds = model_type.predict(model, valid_ds, keep_images=True)

[... stack frames through icevision, torch, and torchvision elided ...]

RuntimeError: CUDA out of memory. Tried to allocate 3.52 GiB (GPU 0; 11.91 GiB total capacity; 5.32 GiB already allocated; 2.94 GiB free; 8.21 GiB reserved in total by PyTorch)
{% endraw %} {% raw %}
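{% endraw %}

The predictions evidently succeeded on a re-run (the cells below use preds), but the out-of-memory error can be avoided by running inference in smaller batches rather than pushing the whole validation set through the model at once. A sketch using IceVision's inference dataloader (assuming the infer_dl / predict_from_dl API in this version of the library; it requires the trained model and GPU from above, so it is not runnable standalone):

```python
# Batched inference: only `batch_size` images occupy GPU memory at a time.
infer_dl = model_type.infer_dl(valid_ds, batch_size=4)
preds = model_type.predict_from_dl(model, infer_dl, keep_images=True)
```

{% raw %}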
show_preds(preds=preds[0:10])
{% endraw %} {% raw %}
len(train_ds), len(valid_ds), len(preds)
(832, 209, 209)
{% endraw %} {% raw %}
def get_bblist(pred):
    """Return predicted bboxes as plain [xmin, ymin, xmax, ymax] lists."""
    return [[bb.xmin, bb.ymin, bb.xmax, bb.ymax]
            for bb in pred.pred.detection.bboxes]

get_bblist(preds[0])      
{% endraw %} {% raw %}
results = []
for i in range(len(preds)):
    #print(f"i = {i}, file = {str(Path(valid_ds[i].common.filepath).stem)+'.csv'}, bboxes = {get_bblist(preds[i])}, scores={preds[i].pred.detection.scores}\n")
    worst_score = np.min(np.array(preds[i].pred.detection.scores))
    line_list = [str(Path(valid_ds[i].common.filepath).stem)+'.csv', get_bblist(preds[i]), preds[i].pred.detection.scores, worst_score, i]
    results.append(line_list)
    
# store as pandas dataframe
res_df = pd.DataFrame(results, columns=['filename', 'bblist','scores','worst_score','i'])
res_df = res_df.sort_values('worst_score')  # order by worst score as a "top losses" kind of thing
res_df.head() # take a look
{% endraw %} {% raw %}
res_df.to_csv('bboxes_top_losses_fake2.csv', index=False)
{% endraw %}
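One caveat for reloading this CSV later: pandas writes the list-valued bblist column as its string repr, so it must be parsed back, e.g. with ast.literal_eval. (The scores column, saved from a numpy array, would need extra cleanup since its repr has no commas.) A small standard-library-only sketch of the round-trip:

```python
import ast, csv, io

# A list written into a CSV cell comes back as its string repr;
# ast.literal_eval restores the nested-list structure.
row = {"filename": "steelpan_0001026.csv",
       "bblist": [[176, 16, 415, 265], [314, 288, 489, 381]]}
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["filename", "bblist"])
writer.writeheader()
writer.writerow(row)

buf.seek(0)
loaded = next(csv.DictReader(buf))
bblist = ast.literal_eval(loaded["bblist"])
print(bblist[0])  # [176, 16, 415, 265]
```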